Workload Dependent Hadoop MapReduce Application Performance Modeling
Authors
Abstract
In any distributed computing environment, performance optimization, job runtime prediction, and capacity and scalability quantification studies are considered rather complex, time-consuming, and expensive, and their results are often error-prone. Owing to the nature of the Hadoop MapReduce framework, many MapReduce production applications are executed against varying data-set sizes [5]. Hence, one pressing question for any Hadoop MapReduce setup is how to quantify the job completion time based on the varying data-set sizes and the physical and logical cluster resources at hand. Further, if the job completion time does not meet the goals and objectives, can any Hadoop tuning or cluster resource adjustment alter the job execution time so that it actually meets the required SLAs?
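
To make the quantification question concrete, the following is a minimal sketch, not taken from the paper, of one simple way to model completion time: fit T(d) = a + b*d over observed (data-set size, runtime) pairs with ordinary least squares and extrapolate to an unseen size. The class name and all observation values are hypothetical.

    import java.util.Arrays;

    /** Sketch: least-squares fit of T(d) = a + b*d from observed runs (illustrative data). */
    public class RuntimeModel {
        public static void main(String[] args) {
            // Hypothetical observations: data-set size in GB, job completion time in minutes.
            double[] sizeGb  = {10, 50, 100, 250, 500};
            double[] minutes = {4.1, 15.8, 30.2, 73.5, 146.0};

            double meanX = Arrays.stream(sizeGb).average().orElse(0);
            double meanY = Arrays.stream(minutes).average().orElse(0);
            double sxy = 0, sxx = 0;
            for (int i = 0; i < sizeGb.length; i++) {
                sxy += (sizeGb[i] - meanX) * (minutes[i] - meanY);
                sxx += (sizeGb[i] - meanX) * (sizeGb[i] - meanX);
            }
            double b = sxy / sxx;         // marginal cost in minutes per GB
            double a = meanY - b * meanX; // fixed startup/framework overhead

            // Predict completion time for an unseen data-set size, e.g. 1 TB.
            double predicted = a + b * 1024;
            System.out.printf("T(d) = %.2f + %.4f * d  =>  T(1024 GB) ~ %.1f min%n", a, b, predicted);
        }
    }

Such a linear fit only captures workloads whose runtime scales roughly proportionally with input size; a production model would also have to account for cluster resources and configuration, which is the paper's stated subject.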
Similar resources
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy for MapReduce in a homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
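
As a rough illustration of the idea behind capacity-aware placement in a heterogeneous cluster (a sketch under assumed node capacities, not this paper's algorithm), data blocks can be assigned to nodes in proportion to each node's measured computing capacity:

    import java.util.LinkedHashMap;
    import java.util.Map;

    /** Sketch: distribute data blocks proportionally to per-node computing capacity. */
    public class ProportionalPlacement {
        public static Map<String, Long> placeBlocks(Map<String, Double> capacity, long totalBlocks) {
            double totalCapacity = capacity.values().stream().mapToDouble(Double::doubleValue).sum();
            Map<String, Long> assignment = new LinkedHashMap<>();
            long assigned = 0;
            for (Map.Entry<String, Double> node : capacity.entrySet()) {
                long share = Math.round(totalBlocks * node.getValue() / totalCapacity);
                assignment.put(node.getKey(), share);
                assigned += share;
            }
            // Rounding may leave a small remainder; park it on the first node for simplicity.
            String first = capacity.keySet().iterator().next();
            assignment.merge(first, totalBlocks - assigned, Long::sum);
            return assignment;
        }

        public static void main(String[] args) {
            Map<String, Double> capacity = new LinkedHashMap<>();
            capacity.put("fastNode", 4.0); // hypothetical relative compute ratios
            capacity.put("midNode", 2.0);
            capacity.put("slowNode", 1.0);
            System.out.println(placeBlocks(capacity, 700)); // {fastNode=400, midNode=200, slowNode=100}
        }
    }

Under this scheme, faster nodes receive proportionally more blocks, so map tasks processing local data finish at roughly the same time, avoiding the straggler effect the default homogeneous placement can cause.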
Towards Energy Efficient MapReduce
Energy considerations are important for Internet datacenter operators, and MapReduce is a common Internet datacenter application. In this work, we use the energy efficiency of MapReduce as a new perspective for increasing Internet datacenter productivity. We offer a framework for analyzing software energy efficiency in general, and MapReduce energy efficiency in particular. We characterize the pe...
MRBS: A Comprehensive MapReduce Benchmark Suite
MapReduce is a promising programming model for distributed data processing. Extensive research has been conducted on the scalability of MapReduce, and several systems have been proposed in the literature, ranging from job scheduling to data placement and replication. However, realistic benchmarks for analyzing and comparing the effectiveness of these proposals are still missing. To date, most MapRed...
Optimization of Workload Prediction Based on Map Reduce Frame Work in a Cloud System
Nowadays, cloud computing is an emerging technology, used to access resources anytime and anywhere through the Internet. Hadoop is an open-source cloud computing environment that implements the Google™ MapReduce framework. Hadoop is a framework for distributed processing of large datasets across large clusters of computers. This paper considers the workload of jobs running in cluster mode using Hadoop. MapRed...
Hadoop Performance Tuning - A Pragmatic & Iterative Approach
Hadoop represents a Java-based distributed computing framework designed to support applications implemented via the MapReduce programming model. In general, workload-dependent Hadoop performance optimization efforts have to focus on three major categories: the system's HW, the system's SW, and the configuration and tuning/optimization of the Hadoop infrastructure components. F...
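
As a hedged example of the third category, the sketch below sets a few well-known Hadoop 2 job-configuration knobs programmatically. The configuration keys are real Hadoop property names; the chosen values and the class name are purely illustrative, not tuning recommendations.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    /** Sketch: workload-dependent tuning via Hadoop job configuration (illustrative values). */
    public class TunedJobSetup {
        public static Job configure() throws Exception {
            Configuration conf = new Configuration();
            // Map-side sort buffer: a larger buffer reduces spill-to-disk for map-heavy jobs.
            conf.setInt("mapreduce.task.io.sort.mb", 256);
            // Reducer parallelism: commonly sized to the cluster's reduce capacity.
            conf.setInt("mapreduce.job.reduces", 16);
            // Per-task container memory; must fit within the node manager's limits.
            conf.setInt("mapreduce.map.memory.mb", 2048);
            conf.setInt("mapreduce.reduce.memory.mb", 4096);
            return Job.getInstance(conf, "tuned-wordcount");
        }
    }

Because such settings interact with both the workload and the cluster hardware, they are the natural target of the iterative tuning approach this paper describes.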
Publication date: 2013